Skip to content

fix: align sync status heuristic with leanSpec#417

Merged
pablodeymo merged 7 commits into
lambdaclass:mainfrom
dicethedev:fix/sync-status-heuristic
Jun 11, 2026
Merged

fix: align sync status heuristic with leanSpec#417
pablodeymo merged 7 commits into
lambdaclass:mainfrom
dicethedev:fix/sync-status-heuristic

Conversation

@dicethedev

Copy link
Copy Markdown
Contributor

🗒️ Description / Motivation

  • Updates the heuristic used by lean_node_sync_status to follow leanSpec PR feat: add validator sync-lag duty gate leanEthereum/leanSpec#708.
  • The previous heuristic considered the node synced only when its head reached the current slot.
  • This provides a sufficiently stable sync signal for a follow-up PR to gate validator duties.

What Changed

  • Added the leanSpec sync-lag thresholds:
    • 4-slot local sync-lag threshold.
    • 8-slot network-stall threshold.
    • 2-slot hysteresis band.
  • Added a stateful sync-status tracker to the blockchain actor.
  • Refreshes sync status on every tick.
  • Uses the freshest validated live-chain block to distinguish local lag from a network-wide stall.
  • Added tests for threshold boundaries, network stalls, hysteresis, and future-head clock skew.

Correctness / Behavior Guarantees

  • A node is marked syncing when its head is more than four slots behind while fresher validated blocks are known.
  • Network-wide stalls do not mark the node as syncing.
  • Once syncing, the node must recover to two slots behind before returning to synced.
  • Slot differences use saturating subtraction to handle future-head clock skew.
  • This PR only updates the metric heuristic. It does not yet skip proposals or attestations.

Tests Added / Run

  • Added unit tests covering:
    • Lag at and below the threshold.
    • Lag above the threshold.
    • Network-wide stalls.
    • Hysteresis during recovery.
    • Clock skew with a future head.
  • Ran cargo fmt --all -- --check.
  • Ran git diff --check.
  • Full tests and lint remain to be run after the pinned leanVM dependency is available.

Related Issues / PRs

✅ Verification Checklist

  • Ran make fmt — clean
  • [ x] Ran make lint (clippy with -D warnings) — clean
  • Ran cargo test --workspace --release — all passing

@greptile-apps

greptile-apps Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the simplistic "head == current slot" sync heuristic with a stateful SyncStatusTracker that follows leanSpec PR #708, adding lag thresholds, network-stall detection, and hysteresis to prevent flapping.

  • Introduces SYNC_LAG_THRESHOLD (4), NETWORK_STALL_THRESHOLD (8), and SYNC_HYSTERESIS_BAND (2) constants plus a SyncStatusTracker struct updated on every tick (5x per slot at 800 ms each).
  • Moves sync-status updates from process_block to on_tick so the metric refreshes even when no new blocks arrive; adds unit tests covering threshold boundaries, hysteresis, network stalls, and clock skew.

Confidence Score: 4/5

Safe to merge for the stated observability goal; the state-machine logic follows leanSpec PR #708 and is well-covered by the new unit tests.

The state-machine logic and test coverage are solid. Two small issues are worth tidying before this feeds into validator-duty gating: the live-chain full-HashMap allocation on every 800 ms tick, and the bare unsigned subtraction in the hysteresis recovery condition which would silently break hysteresis in a release build if the constants are ever reordered.

crates/blockchain/src/lib.rs — the update_sync_status helper and the hysteresis recovery expression on line 89.

Important Files Changed

Filename Overview
crates/blockchain/src/lib.rs Core change: adds SyncStatusTracker with correct threshold/hysteresis logic; two minor issues — full HashMap allocation on every 800 ms tick and an unguarded unsigned subtraction in the recovery condition.
crates/blockchain/src/metrics.rs Adds #[derive(Debug, Clone, Copy, PartialEq, Eq)] to SyncStatus to support equality assertions in the new unit tests; no logic changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Every Tick 800ms]) --> B{network_lag > 8?}
    B -- Yes --> C[syncing = false]
    B -- No --> D{currently syncing?}
    D -- Yes --> E{head_lag > 2?}
    E -- Yes --> F[syncing = true]
    E -- No --> G[syncing = false]
    D -- No --> H{head_lag > 4?}
    H -- Yes --> I[syncing = true]
    H -- No --> J[syncing = false]
    C & G & J --> K([Synced])
    F & I --> L([Syncing])
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/blockchain/src/lib.rs:721-727
`get_live_chain()` collects the entire `LiveChain` table into a `HashMap<H256, (u64, H256)>` only to compute the maximum slot. This allocation runs 5 times per slot (once per 800 ms tick). During finality-delay events the live chain can grow to hundreds of entries. Adding a dedicated `Store::max_live_chain_slot()` that iterates without collecting would avoid the allocation entirely.

```suggestion
        let max_seen_slot = self
            .store
            .max_live_chain_slot()
            .unwrap_or(head_slot);
```

### Issue 2 of 2
crates/blockchain/src/lib.rs:89
The expression `SYNC_LAG_THRESHOLD - SYNC_HYSTERESIS_BAND` is a bare `u64` subtraction. In a debug build this would panic if `SYNC_HYSTERESIS_BAND` were ever raised above `SYNC_LAG_THRESHOLD`; in a release build it silently wraps to a huge value, making the condition `head_lag > u64::MAX - delta` always false — the syncing flag would reset on the very next tick regardless of actual lag, silently breaking hysteresis. Using `saturating_sub` costs nothing and makes the intent explicit.

```suggestion
            self.syncing = head_lag > SYNC_LAG_THRESHOLD.saturating_sub(SYNC_HYSTERESIS_BAND);
```

Reviews (1): Last reviewed commit: "fix: align sync status heuristic with le..." | Re-trigger Greptile

Comment thread crates/blockchain/src/lib.rs Outdated
Comment thread crates/blockchain/src/lib.rs Outdated
dicethedev and others added 2 commits June 6, 2026 06:08
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

@pablodeymo pablodeymo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is failing.
Can you please solve those issues?
thanks!

@MegaRedHand MegaRedHand left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some conflicts

syncing: bool,
}

impl SyncStatusTracker {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to another module?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the impl SyncStatusTracker?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the whole SyncStatusTracker and related constants. Also the tests. I want to make this file lighter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open a PR for this to fix this.

@pablodeymo pablodeymo merged commit 7208707 into lambdaclass:main Jun 11, 2026
2 checks passed
pablodeymo added a commit that referenced this pull request Jun 12, 2026
## 🗒️ Description / Motivation

- Prevents validators from proposing blocks or producing attestations
while their local view is syncing.
- Signing with a stale local view can place proposals and attestations
on a chain the network has already left.
- Builds on the leanEthereum/leanSpec#708 sync heuristic added in the
preceding PR (#417 ).

## What Changed

- Gate scheduled block proposals using the node’s sync status.
- Report `has_proposal = false` to the store when a proposal is skipped.
- Gate attestation production at interval 1.
- Log skipped proposal and attestation duties.
- Keep block processing, fork choice, aggregation, metrics, and key
advancement active while syncing.
- Add tests covering gating, recovery, and network-wide stalls.

## Correctness / Behavior Guarantees

- Proposals and attestations are disabled only while the local node is
considered syncing.
- Duties resume once the node catches up past the hysteresis recovery
boundary.
- Duties remain enabled during a network-wide stall so validators can
help the chain recover.
- Incoming network data and synchronization processing remain
unaffected.
- A skipped proposal is not reported to fork choice as locally
available.

## Tests Added / Run

- Added tests verifying:
  - Proposals and attestations are gated while syncing.
  - Duties resume after catching up.
  - Duties remain enabled during network-wide stalls.
- Ran `cargo test -p ethlambda-blockchain --lib --offline`.
  - 31 passed; 0 failed.
- Ran `cargo clippy -p ethlambda-blockchain --lib --offline -- -D
warnings`.
- Ran `cargo fmt --all -- --check`.
- Ran `git diff --check`.

## Related Issues / PRs

- Closes #236
- Depends on the sync-status heuristic PR.
- Follows leanEthereum/leanSpec#708.

## ✅ Verification Checklist

- [x] Ran `make fmt` — clean
- [x] Ran `make lint` (clippy with `-D warnings`) — clean
- [ ] Ran `cargo test --workspace --release` — all passing

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Pablo Deymonnaz <pdeymon@fi.uba.ar>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants